StoRies from Biostatistics to Health Data Science

Eric J. Daza, DrPH, MPS (Twitter @ericjdaza)

30 September 2021 Thursday

Summary

Understanding and shaping the health of populations requires both qualitative and quantitative scientific methods. Statistical concepts are used to structure and manage the uncertainty inherent in the scientific study of human life, behavior, and society. In this workshop, I’ll share some of my experiences as a biostatistician and digital health data scientist—all (well, let’s say 95%) using R.

Agenda

  1. Your Host
  2. Biostatistics and Data Science Settings
  3. Career Paths
  4. Study Designs
  5. Analysis Methods
  6. Wrap Up

1. Your Host

Geography

Training

Career

2. Biostatistics and Data Science Settings

3. Biostatistics Career Paths

Training

Roles

Professional Areas

4. Study Designs

Variables

Designs

5. Analysis Methods: Foundations

“Confusing P-values with Clinical Impact: The Significance Fallacy”

Statistical Hypothesis Testing

5. Analysis Methods: Cross-Sectional Analysis

Simulated/Synthetic Example: 12-week RCT comparing two drugs on COVID-19 infection (average treatment effect)

tbl_perpid_fancy %>%
  dplyr::select(`Patient ID`, Arm, `Infection Status`) %>%
  # Convert every column to character so the "..." placeholder row can be appended
  dplyr::mutate(dplyr::across(dplyr::everything(), as.character)) %>%
  head() %>%
  # Add a placeholder row to signal that the table continues
  dplyr::bind_rows(
    dplyr::tibble(
      `Patient ID` = "...",
      Arm = "...",
      `Infection Status` = "..."
    )
  ) %>%
  knitr::kable(align = "c")
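| Patient ID |    Arm     | Infection Status |
|:----------:|:----------:|:----------------:|
|     1      | Feknuzison |   Not Infected   |
|     2      | Remdazavir |   Not Infected   |
|     3      | Remdazavir |   Not Infected   |
|     4      | Remdazavir |   Not Infected   |
|     5      | Feknuzison |   Not Infected   |
|     6      | Remdazavir |   Not Infected   |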

tbl_perpid_fancy %>% fancyTable(c("Infection Status", "Arm"))
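|              | n.Feknuzison | n.Remdazavir | pct.Feknuzison | pct.Remdazavir |
|:-------------|-------------:|-------------:|---------------:|---------------:|
| Infected     |           27 |           12 |            5.4 |            2.4 |
| Not Infected |          473 |          488 |           94.6 |           97.6 |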

fisher.test(
  x = tbl_perpid_fancy$trt,
  y = tbl_perpid_fancy$infection
)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tbl_perpid_fancy$trt and tbl_perpid_fancy$infection
## p-value = 0.02107
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.1965155 0.8923482
## sample estimates:
## odds ratio 
##  0.4311286
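
As a rough check on the output above, the same test can be rerun from the four cell counts in the 2x2 table alone. This is a minimal sketch, assuming the counts shown in the table are the full data; `tab`, `or_sample`, and `ft` are names introduced here for illustration, and the arm labels come from the table rather than from the `trt` coding.

```r
# Rebuild the 2x2 table from the counts shown above
# (rows: arm, columns: infection status). matrix() fills column by column.
tab <- matrix(
  c(473, 488, 27, 12),
  nrow = 2,
  dimnames = list(
    Arm = c("Feknuzison", "Remdazavir"),
    Infection = c("Not Infected", "Infected")
  )
)

# Sample (cross-product) odds ratio: odds of infection on Remdazavir
# divided by odds of infection on Feknuzison.
or_sample <- (tab["Remdazavir", "Infected"] / tab["Remdazavir", "Not Infected"]) /
  (tab["Feknuzison", "Infected"] / tab["Feknuzison", "Not Infected"])

# Fisher's exact test on the same table. Its odds ratio is a conditional
# maximum likelihood estimate, so it can differ slightly from or_sample.
ft <- fisher.test(tab)

round(or_sample, 4)
ft$estimate
ft$p.value
```

The cross-product ratio is (12/488)/(27/473), about 0.43, which is close to but not identical to the conditional estimate that `fisher.test()` reports.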

Logistic regression: \(\text{logit} \Pr(Y=1) = \beta_0 + \beta_1 X\), where \(Y\) is infection and \(X\) is trt in the fitted model

## 
## Call:
## glm(formula = infection ~ trt, family = binomial, data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3332  -0.3332  -0.2204  -0.2204   2.7312  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.8633     0.1979 -14.471   <2e-16 ***
## trt          -0.8422     0.3529  -2.386    0.017 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 329.51  on 999  degrees of freedom
## Residual deviance: 323.35  on 998  degrees of freedom
## AIC: 327.35
## 
## Number of Fisher Scoring iterations: 6
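
One way to read the coefficients above is to move them back to the odds and probability scales. The sketch below rebuilds a data set from the cell counts in the earlier 2x2 table, assuming `trt = 0` is Feknuzison and `trt = 1` is Remdazavir (a coding inferred from the fitted intercept, not stated in the slides); `dat`, `fit`, and the result names are illustrative.

```r
# Rebuild one row per patient from the 2x2 counts (500 patients per arm).
dat <- data.frame(
  trt = rep(c(0, 1), each = 500),
  infection = c(rep(1, 27), rep(0, 473),  # assumed trt = 0: Feknuzison
                rep(1, 12), rep(0, 488))  # assumed trt = 1: Remdazavir
)

fit <- glm(infection ~ trt, family = binomial, data = dat)

# exp(beta_1) is the odds ratio comparing trt = 1 with trt = 0.
odds_ratio <- exp(coef(fit)[["trt"]])

# plogis() inverts the logit, giving the fitted infection risk in each arm.
risk_trt0 <- plogis(coef(fit)[["(Intercept)"]])
risk_trt1 <- plogis(sum(coef(fit)))

round(c(odds_ratio = odds_ratio, risk_trt0 = risk_trt0, risk_trt1 = risk_trt1), 3)
```

Under these assumptions the fitted risks match the 5.4% and 2.4% infection rates in the table, and exp(-0.8422) is about 0.43, in line with the odds ratio from Fisher's exact test.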

5. Analysis Methods: Survival/Time-to-Event (TTE) Analysis

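| Patient ID |    Arm     | Week | Infection Status at Week |
|:----------:|:----------:|:----:|:------------------------:|
|     75     | Remdazavir |  1   |       Not Infected       |
|     75     | Remdazavir |  2   |       Not Infected       |
|     75     | Remdazavir |  3   |       Not Infected       |
|     75     | Remdazavir |  4   |       Not Infected       |
|     75     | Remdazavir |  5   |       Not Infected       |
|     75     | Remdazavir |  6   |         Infected         |
|     75     | Remdazavir |  7   |         Infected         |
|     75     | Remdazavir |  8   |         Infected         |
|     75     | Remdazavir |  9   |         Infected         |
|     75     | Remdazavir |  10  |         Infected         |
|     75     | Remdazavir |  11  |         Infected         |
|     75     | Remdazavir |  12  |         Infected         |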
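
For a time-to-event analysis, weekly records like these are typically collapsed to one row per patient: the week of first infection (or the last week observed, if never infected) plus an event indicator. A minimal sketch, assuming this layout: patient 75 is copied from the table above, patient 76 is a hypothetical never-infected patient added for contrast, and `long`, `to_tte`, `tte`, and `km` are illustrative names. The Kaplan-Meier call uses the `survival` package, which may or may not be what was used for the slides.

```r
library(survival)

# Long format: one row per patient per week (1 = infected at that week).
long <- data.frame(
  id = rep(c(75, 76), each = 12),
  week = rep(1:12, times = 2),
  infected = c(rep(0, 5), rep(1, 7),  # patient 75: first infected at week 6
               rep(0, 12))            # patient 76 (hypothetical): never infected
)

# Collapse one patient's weekly rows into a single time-to-event record.
to_tte <- function(d) {
  event <- any(d$infected == 1)
  data.frame(
    id = d$id[1],
    # Event time is the first infected week; otherwise censor at the last week seen.
    time = if (event) min(d$week[d$infected == 1]) else max(d$week),
    event = as.integer(event)
  )
}
tte <- do.call(rbind, lapply(split(long, long$id), to_tte))
tte

# Kaplan-Meier estimate of remaining infection-free over time.
km <- survfit(Surv(time, event) ~ 1, data = tte)
summary(km)
```

Patient 75 contributes an event at week 6, while the hypothetical patient 76 is censored at week 12 because no infection was observed during follow-up.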

5. Analysis Methods: Longitudinal/Panel Analysis

5. Analysis Methods: Missing Data

5. Analysis Methods: Advanced

Survey Sampling

Demographics

Bayesian Statistics

Machine Learning

Econometrics, Sociology, Psychometrics

Genomics

5. Analysis Methods: Advanced: Causal Inference

Directed Acyclic Graph (DAG)

RCT / Clinical Trial

DiagrammeR::grViz("
digraph causal {

  # Nodes
  node [shape = plaintext]
  X [label = 'Drug \n (X)']
  C [label = 'Sickness Level \n (C)']
  Y [label = 'COVID-19 Infection \n (Y)']

  # Edges
  edge [color = black, arrowhead = vee]
  rankdir = LR
  X -> Y
  C -> Y
  
}",
width = 800,
height = 270
)

Observational Study / Real World Data

DiagrammeR::grViz("
digraph causal {

  # Nodes
  node [shape = plaintext]
  X [label = 'Drug \n (X)']
  C [label = 'Sickness Level \n (C)']
  Y [label = 'COVID-19 Infection \n (Y)']

  # Edges
  edge [color = black, arrowhead = vee]
  rankdir = LR
  X -> Y
  C -> X
  C -> Y
  
}",
width = 800,
height = 270
)
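
The practical difference between a DAG with and without the C -> X arrow can be illustrated with a small simulation. In this sketch all numbers are invented for illustration: sickness level C raises infection risk by 10 percentage points, the drug X lowers it by 2, and in the observational setting sicker patients are more likely to receive the drug. `simulate_study` is a hypothetical helper, and a linear probability model is used only to keep the arithmetic simple.

```r
set.seed(2021)

simulate_study <- function(n, randomized) {
  C <- rbinom(n, 1, 0.5)  # sickness level (1 = sicker)
  # RCT: treatment ignores C. Observational: sicker patients get the drug more often.
  p_treat <- if (randomized) rep(0.5, n) else 0.2 + 0.6 * C
  X <- rbinom(n, 1, p_treat)
  # Infection risk: baseline 5%, +10 points if sicker, -2 points if treated.
  Y <- rbinom(n, 1, 0.05 + 0.10 * C - 0.02 * X)
  data.frame(C, X, Y)
}

rct <- simulate_study(2e5, randomized = TRUE)
obs <- simulate_study(2e5, randomized = FALSE)

# Risk-difference estimates for X (true value in this simulation: -0.02).
est <- c(
  rct_unadjusted = coef(lm(Y ~ X, data = rct))[["X"]],
  obs_unadjusted = coef(lm(Y ~ X, data = obs))[["X"]],
  obs_adjusted   = coef(lm(Y ~ X + C, data = obs))[["X"]]
)
round(est, 3)
```

With the C -> X path open, the unadjusted observational estimate is expected to land near +0.04, making the drug look harmful, while randomizing treatment or adjusting for C should recover a value near the simulated -0.02.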

https://tinyurl.com/phkfwbz5

https://tinyurl.com/4zvuat8

5. Analysis Methods: Advanced: N-of-1

Daza EJ, Wac K, Oppezzo M. Effects of Sleep Deprivation on Blood Glucose, Food Cravings, and Affect in a Non-Diabetic: An N-of-1 Randomized Pilot Study. Healthcare. 2020 Mar;8(1):6. Multidisciplinary Digital Publishing Institute.

5. Analysis Methods: Advanced: N-of-1 Causal Inference

SBM + JSM 2021

6. Wrap Up: References

6. Wrap Up: Causal Inference Books

6. Wrap Up: Acknowledgements and Financial Disclosures

I originally created these slides for the 2020 Pilipinx American Public Health Conference (PAPHC). I thank PAPHC for motivating this presentation and for providing coordination and technical support. This presentation is independent of my work for Evidation Health, where I am a full-time employee.

6. Wrap Up: About the Presenter

Dr. Daza is a data science statistician and digital health data scientist who develops causal inference methods for personal (n-of-1) digital health. He works at Evidation Health (evidation.com). “Significance of evidence is not evidence of significance.” (tinyurl.com/94dc9vn5)

🤓😎🇵🇭

Thank you! Maraming Salamat!

😃😄🙏

Questions?